Structure of Phonetic Categories
Abstract
Recently, speech researchers have begun to examine the formation of speech sound (phonetic) categories and to analyze the internal structure of the consequent categories. One of the most prominent products of this subfield has been the Perceptual Magnet Effect (PME) and the attendant Native Language Magnet (NLM) theory of Kuhl (1991, 2000). In the present paper, a critical review of the evidence for NLM is offered. Because of concerns about the nature of the stimuli, possible confounds inherent in the empirical procedures and failed replications, it is concluded that there is little positive evidence supporting NLM. However, the goal of uncovering the structures of phonetic categories and mechanisms responsible for those structures remains central to an understanding of language acquisition and speech perception more generally. Data from several empirical paradigms investigating the formation and structure of complex auditory categories are beginning to form a coherent picture of phonetic category acquisition.

Structure and Function in the Acquisition of Phonetic Categories: Fingerprints of the Learning Process

Given the complex nature of the speech signal, the task for the competent language user is quite daunting. Some of the variance in the acoustic pattern of the speech signal is directly relevant to the linguistic message. This includes the acoustic features defining phonemes, syllables and words that are meaningful for the particular language being spoken. Other portions of the variance are unrelated to the specific linguistic structure of the message. Even under the quietest environmental conditions, acoustic patterns vary due to differences in specific speaker anatomy, phonetic context (coarticulation), and affect of the speaker. In order to perceive this signal in a linguistically appropriate manner, listeners must be able to accomplish two complementary perceptual feats.
They must discriminate the acoustic variance that is linguistically relevant and generalize across the variance that is irrelevant. That is, the perceiver must categorize the incoming speech sounds in a manner specific to their language. In some sense, the process of assigning sounds to categories is the essence of speech perception. Of course, as remarkable as this accomplishment is for the adult native-language speaker, the assignment of sounds to categories by a child learning a language is even more awe-inspiring. Children must form these categories from the mire of the speech signal without even an indication of how many categories exist in the language thrust upon them. It has been estimated that there are 869 different phonemes utilized across the world’s languages (Maddieson, 1984). The size of phonemic inventories varies from as few as 11 to as many as 141! The infant is not supplied with this information, but must still carry on with the tasks of discriminating and generalizing across the acoustic input in a language-appropriate fashion.

In the first 6 months, infants demonstrate the ability to discriminate many of the contrasts used in language (see Jusczyk, 1997, for a review). In fact, there is evidence based on cardiac deceleration for fetal discrimination of vowel sounds (LeCanuet & Granier-Deferre, 1993). In terms of generalization, 6-month-old infants demonstrate the ability to ignore variance due to speaker change (Hillenbrand, 1983, 1984; Kuhl, 1979, 1983). Between 8 and 12 months of age, infants stop making discriminations for non-native contrasts (e.g., Werker & Tees, 1984). That is, infants begin to generalize across acoustic differences that do not distinguish among phoneme categories in their native language (Best, 1995; Holt, Lotto, & Kluender, 1998). Thus, the dual tasks of speech sound discrimination and generalization appear to be well underway by the time the infant speaks his/her first word.
In the last 50 years of empirical work on speech perception, much of the effort has been focused on examining these twin abilities of discrimination and generalization of acoustic variance in relation to phonemes. For example, one of the most debated and examined phenomena in speech perception research is categorical perception (Liberman, Harris, Hoffman, & Griffith, 1957; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Harnad, 1987). In typical demonstrations of categorical perception, a series of speech sounds that varies in perceived phonemic label (e.g., from /da/ to /ga/) is synthesized. These sounds are presented to listeners in identification and discrimination tasks. Identification functions obtained from labeling each stimulus as an example of a particular phoneme tend to have a rather steep slope with few exemplars perceived as ambiguous. In the discrimination tasks, listeners have little trouble distinguishing exemplars that differ in phonemic label (e.g., easily responding to the difference between a /da/ and a /ga/), but they have great difficulty discriminating exemplars assigned the same label (e.g., problems discriminating one /da/ from another /da/). This pattern holds even when the two pairs are equated for physical difference (e.g., equal changes in the onset frequency of the third formant). That is, listeners discriminate across phonetic boundaries and generalize within a phonetic category. This pattern of responses has been demonstrated for a number of phonemic contrasts (see Repp, 1984, for review). These results have led researchers to suggest that the grain of speech perception is at the level of the phonetic category.

Despite this empirical focus on the phonetic category, the field of speech perception has traditionally not taken these categories seriously qua categories.
That is, prior to the last decade and a half, there were few attempts to uncover the structure of these categories and relate them to the voluminous research on other conceptual and perceptual categories. There are at least three reasons for this disconnection between research in speech perception and general categorization. First, the literature on general perceptual categorization has tended to focus on visual categories. These visual categories often are defined by the presence or absence of discrete features with few total exemplars in each category (e.g., Maddox & Ashby, 1993; Reed, 1996). Speech categories, in contrast, are auditory and are defined by continuous values across a number of imperfectly-valid attribute variables, resulting in multitudes of possible category exemplars. As a result, it has been difficult to make clear and coherent predictions about speech categories from the existing work on general perceptual categorization.

A second factor leading to the disengagement of speech perception research from the general categorization literature has been the claim that speech perception is a “special” perceptual process unlike general audition (e.g., Liberman et al., 1967; Liberman & Mattingly, 1985, 1989). From this view, there is little to be gained by utilizing results derived from general categorization studies in explicating the phenomena of speech perception. Speech perception is viewed as its own animal, evolving specifically to handle the unique problems posed by the linguistic purpose of the signal. To the extent that this view has been orthodoxy in the field (see Trout, 2001, for a recent declaration of the view), there has been little impetus to study speech sound categories as general perceptual categories.

The third barrier to the integration of speech perception and categorization has been the traditional focus of speech research on the phoneme (Lotto & Holt, 2000).
This abstract, discrete unit has been borrowed from the descriptive toolbox of linguists. Given that speech serves a linguistic function and given the success that linguists have had in describing many phenomena across languages using the phoneme, the centrality of the phoneme in speech perception research makes good sense. Most theories of speech perception either explicitly express or tacitly imply that the phoneme (or some similar abstract discrete symbol) is the output of the process of speech perception. In concordance, many cognitive theories of word recognition presume that “higher levels” of language processing are given phonemic representations as input (e.g., Forster, 1976, 1979; Marslen-Wilson, 1984, 1987, 1990; McClelland & Elman, 1986). This concentration on the phonemic form has often been at the expense of interest in the acoustic variance within a phoneme class, i.e., the phonetic form (see Diehl, 1991; Ohala, 1990 for historical and critical reviews of this distinction). However, it is in the variance within phoneme classes that one can most easily discern the effects of the categorization processes. By devaluing the acoustic continua in favor of the discrete sign, the field has limited the importance of distinguishing between particular categorization models in speech perception. Researchers have often distinguished two possible modes of perception: the phonemic mode in which the perceiver has access only to the discrete linguistic labels and the auditory mode in which the perceiver has access to the continuous auditory attributes of the signal unfettered by phonemic status of the sound. Critical experiments have been designed to test whether speech perception is an example of a specific phonemic or a general auditory mode of perception or whether listeners can switch between these perceptual modes depending on the task (e.g., Pisoni & Lazarus, 1974; Ganong, 1980; Werker & Logan, 1985). 
This previous work excluded a third possibility; namely, that normal speech perception relies neither on raw auditory representations nor on discrete labels but on continuous representations that are perturbed and constrained by experience with linguistic categories. That is, phonetic categorization affects the output of general audition, but does not replace it. Recently, researchers in speech perception have come to recognize that phonetic categorization does leave fingerprints on the auditory representation of speech sounds, and the results are something less than the discrete segmentation of acoustic variance into phonemes.

Examining Within-Phoneme Variance: Fingerprints of Categorization

As stated earlier, speech perception research has traditionally been dominated by the notion that speech sounds are perceived categorically, i.e., only exemplars that differ in category label are discriminated. Whereas it is true that category status has a significant effect on discrimination performance, it has long been known that perceived category membership does not perfectly predict discrimination performance. For consonants, discrimination performance is usually higher than predictions derived solely from identification functions (Fujisaki & Kawashima, 1969, 1970; Macmillan, 1987). In fact, with training or with more sensitive testing procedures, listeners can become quite good at discriminating within category differences (Carney, Widin, & Viemeister, 1977; Kewley-Port, Watson, & Foyle, 1988). For vowels, the lack of discrete categorical perception is even clearer. Discrimination of acoustic differences from within a single vowel category tends to be quite good, though the superior discriminability of between-category comparisons is present for vowels as well (Pisoni, 1973). Thus, phonetic category membership has extensive effects upon the perception of speech sounds but the sounds retain some of their acoustic individuality.
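What a prediction "derived solely from identification functions" looks like can be made concrete with a minimal sketch. The code below is an illustration, not a reconstruction of any cited study's analysis. It assumes a two-category continuum, a logistic identification function with made-up boundary and slope values, and one standard label-only prediction for an ABX task, in which listeners compare covert labels and guess whenever the labels match. The function names and the seven-step continuum are hypothetical.

```python
import math


def identification_prob(step, boundary=4.0, slope=2.0):
    """Probability of labeling a continuum step as category A (e.g., /da/).

    A logistic function is assumed here purely for illustration; the
    boundary and slope values are invented, not fitted to data.
    """
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_abx_accuracy(p1, p2):
    """Label-only prediction of ABX proportion correct.

    p1 and p2 are the probabilities that each member of the pair is
    labeled as category A. If listeners rely only on labels and guess
    when labels match, accuracy is 0.5 + 0.5 * (p1 - p2) ** 2.
    """
    return 0.5 + 0.5 * (p1 - p2) ** 2


# Hypothetical seven-step continuum, compared in two-step pairs.
steps = list(range(1, 8))
probs = [identification_prob(s) for s in steps]
predictions = [
    predicted_abx_accuracy(probs[i], probs[i + 2])
    for i in range(len(probs) - 2)
]

for i, acc in enumerate(predictions):
    print(f"steps {steps[i]} vs {steps[i + 2]}: predicted accuracy {acc:.3f}")
```

Under these assumptions the predicted accuracy peaks for the pair that straddles the category boundary and sits near chance for pairs within a category. Observed discrimination that exceeds these within-category predictions is the kind of evidence the text describes: listeners retain acoustic detail beyond the category label.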
In the last decade and a half, research interest in within-category distinctions has increased and it has become clear that not all exemplars of a phonetic category are created equal. Several studies have asked listeners to make explicit judgments about the “goodness” of an exemplar as the member of a particular phonetic category (e.g., Massaro & Cohen, 1983; Miller & Volaitis, 1989; Volaitis & Miller, 1992; Miller, 1994; Hodgson & Miller, 1996; Iverson & Kuhl, 1996; Kluender, Lotto, Holt & Bloedel, 1998). As is the case with other categories (Rosch, 1975), the data demonstrate that phonetic categories have an internal structure with some exemplars rated as better or more typical members of the category than others. Differences between same-category speech sounds have also been demonstrated in a variety of other empirical paradigms. Speech tokens differ in their effectiveness at inducing identification shifts in selective adaptation studies (McNabb, 1974; Miller, Connine, Schermer, & Kluender, 1983; Samuel, 1982). Some exemplars serve as better competitors in dichotic competition experiments (Miller, 1977; Repp, 1977). Thus, in terms of perceptual efficacy, phonetic categories appear not to be monolithic structures processed solely in terms of their phonemic label. Instead, these categories are complex entities in themselves, with a rich internal structure. This internal structure provides a finer-grained fingerprint of the underlying processes of categorization than does the simple identification function favored in most studies of categorical perception. In the general field of categorization, it has been the internal structure of categories that has provided the most fertile testing bed for competing models of categorization (Knapp & Anderson, 1984; Medin & Smith, 1981). 
Perhaps by examining more closely the structure of phonetic categories, we can come to a greater understanding of how humans accomplish the dual tasks of discrimination and generalization in normal speech perception (Massaro, 1987). In addition, we may be able to gain insight into the formation of these categories by infants and second language learners. The work of Werker and her colleagues (e.g., Werker & Tees, 1984) has demonstrated that infants are generalizing and discriminating speech sounds in a native-language-appropriate manner by the end of their first year of life. But, are the structures of their categories adult-like? Walley and Flege (1999) present evidence for continual development and formation of the internal structure of these categories beyond infancy. They asked 5-year-old, 9-year-old and adult native English speakers to identify vowels from several series. Whereas there was no marked shift in phoneme boundaries related to age, the slopes of the identification functions increased with age suggesting continued refinement of phonetic